Abstract

Let $r$ be a fixed constant and let $\mathcal{H}$ be an $r$-uniform, $D$-regular hypergraph
on $N$ vertices. Assume further that $D \to \infty$ as $N \to \infty$ and that co-degrees of
pairs of vertices in $\mathcal{H}$ are at most $L$, where $L = o(D/\log^5 N)$. We consider the
random greedy algorithm for forming a matching in $\mathcal{H}$. We choose a matching
at random by iteratively choosing edges uniformly at random to be in the
matching and deleting all edges that share at least one vertex with a chosen
edge before moving on to the next choice. This process terminates when there
are no edges remaining in the graph. We show that with high probability the
proportion of vertices of $\mathcal{H}$ that are not saturated by the final matching is at
most $(L/D)^{\frac{1}{2(r-1)} + o(1)}$. This point is a natural barrier in the analysis of the
random greedy hypergraph matching process.



1 Introduction 

Let $\mathcal{H}$ be an $r$-uniform, $D$-regular hypergraph on $N$ vertices, where $r$ is a
fixed constant and $D \to \infty$ as $N \to \infty$. We study the evolution of the random
greedy matching algorithm on $\mathcal{H}$. This process forms a matching (i.e. a collection
of pairwise disjoint edges) in $\mathcal{H}$ by making a series of random choices. We begin
with $\mathcal{M}(0) = \emptyset$ and $\mathcal{H}(0) = \mathcal{H}$. In iteration $i$ an edge $E_i$ is
chosen uniformly at random from $\mathcal{H}(i-1)$ and added to $\mathcal{M}(i-1)$ to form
the matching $\mathcal{M}(i)$. We then form $\mathcal{H}(i)$ by deleting from $\mathcal{H}(i-1)$ all
edges that intersect $E_i$. The process proceeds until the step $M$ where $\mathcal{H}(M)$ is
empty. We are interested in the likely value of $M$; that is, we are interested in the
number of edges in the matching produced by the random greedy process.
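
The process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypergraph is passed as a list of nonempty edges (each a collection of hashable vertex labels), and the function name `random_greedy_matching` and its `seed` parameter are hypothetical, not taken from the paper.

```python
import random
from itertools import combinations


def random_greedy_matching(edges, seed=None):
    """Run the random greedy matching process on a hypergraph.

    `edges` is an iterable of nonempty vertex collections (assumption).
    Returns the chosen edges, in order, as a list of frozensets.
    """
    rng = random.Random(seed)
    remaining = [frozenset(e) for e in edges]   # H(0) = H
    matching = []                               # M(0) is empty
    while remaining:                            # stop once H(i) has no edges
        chosen = rng.choice(remaining)          # E_i uniform from H(i-1)
        matching.append(chosen)
        # H(i): delete every edge meeting E_i (this removes E_i itself too)
        remaining = [e for e in remaining if e.isdisjoint(chosen)]
    return matching


# Example: the complete 3-uniform hypergraph on 6 vertices.
complete_3_uniform = list(combinations(range(6), 3))
example = random_greedy_matching(complete_3_uniform, seed=0)
print(len(example), "edges chosen")
```

The list rebuild in each step costs time proportional to the number of remaining edges, which is fine for small experiments; a serious simulation would index edges by vertex instead.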

The random greedy packing algorithm for producing a partial Steiner system
is an important special case of this process. Let $1 < \ell < k$ be fixed integers. Define
$\mathcal{H}_{\ell,k}$ to be the hypergraph on vertex set $\binom{[n]}{\ell}$ with edge set consisting of
all sets of the form $\binom{A}{\ell}$ where $A \in \binom{[n]}{k}$. Note that a matching in
$\mathcal{H}_{\ell,k}$ corresponds to a collection of $k$-element subsets of $[n]$ with the property
that the intersection of any pair of sets in the collection has cardinality less than $\ell$;
that is, a matching in $\mathcal{H}_{\ell,k}$ gives a partial $(n, k, \ell)$-Steiner system. The random
greedy matching algorithm applied to $\mathcal{H}_{\ell,k}$ is also known as random greedy packing.
This process is related to the celebrated Rödl nibble [10], which is a semi-random
variation on random greedy packing. The Rödl nibble was introduced in the solution of
the Erdős and Hanani conjecture [6], which states that for every fixed $\ell, k$ there is a
matching in $\mathcal{H}_{\ell,k}$ that saturates $(1 - o(1))\binom{n}{\ell}$ vertices.
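
To make the definition of $\mathcal{H}_{\ell,k}$ concrete, here is a small Python sketch that builds it for tiny parameters and reports its basic parameters. The function names are hypothetical, and the ground set is taken to be {0, ..., n-1} rather than [n] purely for convenience. The formulas $N = \binom{n}{\ell}$, $r = \binom{k}{\ell}$, $D = \binom{n-\ell}{k-\ell}$ and $L = \binom{n-\ell-1}{k-\ell-1}$ follow from counting the $k$-sets that contain a given $\ell$-set, or a given pair of $\ell$-sets whose union has $\ell+1$ elements.

```python
from itertools import combinations
from math import comb


def steiner_hypergraph(n, k, ell):
    """Build H_{ell,k}: vertices are the ell-subsets of {0,...,n-1};
    each k-subset A contributes one edge, the set of all ell-subsets of A."""
    vertices = [frozenset(s) for s in combinations(range(n), ell)]
    edges = [frozenset(frozenset(s) for s in combinations(a, ell))
             for a in combinations(range(n), k)]
    return vertices, edges


def steiner_parameters(n, k, ell):
    """Return (N, r, D, L) for H_{ell,k}, assuming ell < k <= n."""
    N = comb(n, ell)                      # number of vertices
    r = comb(k, ell)                      # edge size (uniformity)
    D = comb(n - ell, k - ell)            # degree of every vertex
    L = comb(n - ell - 1, k - ell - 1)    # maximum co-degree of a pair
    return N, r, D, L


vertices, edges = steiner_hypergraph(6, 3, 2)
print(steiner_parameters(6, 3, 2))
```

For $\ell = 2$, $k = 3$ the vertices are the edges of the complete graph on $n$ points and the hyperedges are its triangles, so a matching is a set of edge-disjoint triangles.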

In this paper we study the general random greedy matching algorithm by
establishing dynamic concentration of the number of edges and the vertex degrees
in the remaining hypergraph. Let $Q(i)$ be the number of edges in $\mathcal{H}(i)$ and let
$d_v(i)$ be the degree of vertex $v$ in $\mathcal{H}(i)$. We aim to show that $Q(i)$ and $d_v(i)$,
appropriately scaled, are tightly concentrated around expected trajectories that we
express as smooth functions on the reals. In order to describe the trajectories we
introduce a continuous time $t$ which we relate to the steps of the process by setting

\[ t = t(i) = i/N. \]

Our study is guided by the following probabilistic intuition: we suspect that $\mathcal{H}(i)$
resembles a subhypergraph of $\mathcal{H}$ chosen uniformly at random from the collection of
all subhypergraphs induced by $N - ir$ vertices. So we anticipate that $\mathcal{H}(i)$ resembles
a subhypergraph of $\mathcal{H}$ induced by a random subset of the vertices where each vertex
is included independently with probability

\[ p = 1 - ir/N = 1 - rt. \]

(Note that this probability can be viewed as a function of either $i$ or $t$; we pass
between these interpretations without comment.) It follows from this assumption
that the probability an edge $E \in \mathcal{H}$ is in $\mathcal{H}(i)$ should be about $p^r$, and therefore
we ought to have

\[ Q(i) \approx |\mathcal{H}| p^r = NDp^r/r. \tag{1} \]

Furthermore, if a vertex $v$ is not saturated by $\mathcal{M}(i)$ then we should have

\[ d_v(i) \approx Dp^{r-1}. \tag{2} \]

Our main result (see Theorem 2.1 below) is that estimates (1) and (2) hold for
most of the evolution of the process. This is a generalization of a recent result of
Bohman, Frieze and Lubetzky [3], who proved an analogous result for the special
case of $\mathcal{H}_{2,3}$.
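
The heuristic trajectories are easy to tabulate. The sketch below simply evaluates $p = 1 - ri/N$ together with the right-hand sides of estimates (1) and (2); the function name is hypothetical and the numerical parameters are arbitrary illustrative values, not data from the paper.

```python
def heuristic_trajectory(N, D, r, i):
    """Return (p, q, d): the unsaturated fraction after i steps and the
    heuristic values N*D*p**r/r for Q(i) and D*p**(r-1) for d_v(i)."""
    p = 1 - r * i / N          # each step saturates exactly r vertices
    q = N * D * p**r / r       # estimate (1)
    d = D * p**(r - 1)         # estimate (2)
    return p, q, d


# Arbitrary illustrative parameters: N = 900, D = 60, r = 3, after 150 steps.
p, q, d = heuristic_trajectory(900, 60, 3, 150)
print(p, q, d)
```

A quick consistency check on the two estimates: the $Np$ unsaturated vertices each have degree about $d$, and every remaining edge is counted $r$ times in the degree sum, so $Np \cdot d$ should equal $r \cdot q$, which it does.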

In order to discuss our main result in more detail, we define the random variable

\[ X = X(\mathcal{H}) := 1 - Mr/N \]

where $M$ is the number of steps before the random greedy matching algorithm on
$\mathcal{H}$ terminates. In other words, $X$ is the proportion of vertices left unsaturated by
the matching produced by the random greedy algorithm. The following bound is a
corollary of Theorem 2.1.

Theorem 1.1. Let $r \ge 2$ and let $\mathcal{H}$ be an $r$-uniform, $D$-regular hypergraph on $N$
vertices. If the maximum co-degree $L$ of a pair of vertices in $\mathcal{H}$ satisfies
$L = o(D/\log^5 N)$ and $X(\mathcal{H})$ is the proportion of vertices that are not saturated by the
matching produced by the random greedy algorithm then with high probability we
have

\[ X(\mathcal{H}) = O\left( (L/D)^{\frac{1}{2(r-1)}} \log^{\frac{5}{2(r-1)}} N \right). \]


Previous analyses of the random greedy matching algorithm due to Spencer [12]
and, independently, Rödl and Thoma [10] showed that if $L = o(D)$ then we have
$X(\mathcal{H}) = o(1)$ with high probability. Note that this result applied to the hypergraph
$\mathcal{H}_{\ell,k}$ gives an alternate proof of the Erdős-Hanani conjecture. Wormald [15] applied
the differential equations method for random graph processes to show that if $\mathcal{H}$ is
an $r$-uniform, $D$-regular hypergraph on $N$ vertices such that $D = o(N)$ but $D \to \infty$
sufficiently quickly as $N \to \infty$ then $X(\mathcal{H}) < D^{-\frac{1}{9r(r-1)+3} + o(1)}$ with high
probability.

We note that Theorem 2.1 takes the analysis of random greedy matching up
to a natural barrier. To describe this barrier we assume estimates (1) and (2)
hold. For a fixed vertex $v$ let $L_v$ be the set of vertices $u$ such that the co-degree
of $u$ and $v$ in $\mathcal{H}$ is $L$. Note that $|L_v|$ can be as large as $D/L$. Now early in the
process (when $p = 1/2$, say) the expected number of vertices in $L_v$ that are not
saturated by $\mathcal{M}$ can be as large as $pD/L$ and thus can have variation as large as
$\sqrt{D/L}$, roughly speaking. This yields variations in vertex degrees that are as large
as $\sqrt{D/L} \cdot L = \sqrt{DL}$. If these early variations in vertex degree persist then at the
point when $Dp^{r-1} = \sqrt{DL}$ these variations will be as large as the expected degree
itself. So, if these variations indeed persist then when we reach this point vertex
degrees could be zero even though the expected vertex degree is large. Note that
this is the point where Theorem 2.1 no longer holds. One would expect that in order
to prove better bounds one would have to show that the variations in vertex degree
decrease as the process evolves.
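
The barrier can be located numerically by solving $Dp^{r-1} = \sqrt{DL}$ for $p$, which gives $p = (L/D)^{1/(2(r-1))}$. The following Python sketch does exactly that; the function names and the sample values of $D$, $L$, $r$ are illustrative assumptions only.

```python
import math


def fluctuation_scale(D, L):
    """Heuristic size sqrt(D*L) of the early variations in vertex degree."""
    return math.sqrt(D * L)


def barrier_fraction(D, L, r):
    """Unsaturated fraction p at which the expected degree D*p**(r-1)
    has dropped to the fluctuation scale sqrt(D*L)."""
    return (L / D) ** (1.0 / (2 * (r - 1)))


# Illustrative values: D = 10**6, L = 1, r = 3.
p_barrier = barrier_fraction(10**6, 1, 3)
print(p_barrier, 10**6 * p_barrier**2, fluctuation_scale(10**6, 1))
```

Larger co-degrees push the barrier earlier in the process: increasing $L$ with $D$ and $r$ fixed increases the returned fraction.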

But where do we expect the random greedy matching algorithm to finally terminate?
If we assume that estimates (1) and (2) hold all the way to termination
then when $NDp^r = Np$ the number of unsaturated vertices should be roughly the
same as the number of remaining edges. At this stage a positive proportion of the
unsaturated vertices should be in no remaining edges; these vertices would remain
unsaturated to termination. Thus, it is natural to guess that random greedy matching
terminates when the proportion of unsaturated vertices is roughly $D^{-1/(r-1)}$.
(We note in passing that this line of reasoning is suspect if $L > D^{\frac{r-2}{r-1}}$. In this
case, one suspects that we will reach a point where co-degrees are larger than
degrees before the supposed termination point.) In the context of random greedy
packing, this line of reasoning leads to the following conjecture.

Conjecture 1.2 (folklore). Let $1 < \ell < k$ be fixed. With high probability

\[ X(\mathcal{H}_{\ell,k}) = n^{-\frac{k-\ell}{\binom{k}{\ell}-1} + o(1)}. \]

The $\ell = 2$, $k = 3$ case of this conjecture was recently proved by Bohman, Frieze
and Lubetzky [3] who establish estimates for vertex degrees in $\mathcal{H}_{2,3}(i)$ with error
bounds that decrease as the process evolves. These self-correcting estimates are
proved using the critical interval method that is featured in this paper and was
introduced in [3]. It should be noted that the sharp result given in [3] requires a
large, carefully selected ensemble of random variables.
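
As a sanity check on the termination heuristic, one can solve $NDp^r = Np$ for $p$ and specialize to random greedy packing, where $r = \binom{k}{\ell}$ and $D = \binom{n-\ell}{k-\ell}$ grows like $n^{k-\ell}$. The sketch below is a back-of-the-envelope calculation under those assumptions; the function names are hypothetical.

```python
from math import comb, isclose


def heuristic_termination_fraction(D, r):
    """Solve N*D*p**r = N*p for p in (0, 1): p = D**(-1/(r-1))."""
    return D ** (-1.0 / (r - 1))


def heuristic_packing_exponent(k, ell):
    """Exponent a such that D**(-1/(r-1)) scales like n**(-a) when
    D grows like n**(k-ell) and r = C(k, ell)."""
    return (k - ell) / (comb(k, ell) - 1)


for k, ell in [(3, 2), (4, 2), (4, 3)]:
    print(k, ell, heuristic_packing_exponent(k, ell))
```

For triangle packing ($\ell = 2$, $k = 3$) the exponent is $1/2$, so the heuristic predicts that about $n^2 \cdot n^{-1/2} = n^{3/2}$ pairs are left uncovered.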

The related problem of proving the existence of a large matching in an r- 
uniform, D-regular hypergraph H has been widely studied (see [S] pQ [8]). The best 
known results are due to Vu [14] who used a semi-random (i.e. Rödl nibble type)
method to show that there exists a matching in $\mathcal{H}$ that saturates all but at most

vertices where $L$ is the maximum co-degree of pairs of vertices in $\mathcal{H}$. Vu obtained
stronger results when one adds co-degree assumptions for larger sets of vertices. 

The remainder of this paper is organized as follows. In the next Section we
give a precise statement of our dynamic concentration result. The proof follows in
Section 3. This proof uses the critical interval method introduced by Bohman, Frieze
and Lubetzky in [3], where they prove Theorem 1.1 for the special case $\mathcal{H}_{2,3}$. In
this note we show that the techniques introduced in [3] are robust enough to handle
the general case (with the introduction of some delicate calculations necessitated
by the large co-degrees).



2 Dynamic Concentration 

Throughout this Section we assume that $\mathcal{H}$ is an $r$-uniform, $D$-regular hypergraph
on $N$ vertices where $r$ is a fixed constant and $D \to \infty$ as $N \to \infty$. We also assume
that the maximum co-degree $L$ of a pair of vertices in $\mathcal{H}$ satisfies $L = o(D/\log^5 N)$.

In order to make the estimates (1) and (2) precise we introduce error bounds
for $Q$ and $d_v$. Define

\[ e_q = 15 N L p^{2-r} (\log N)(1 - r \log p)^2, \]

\[ e_d = \sqrt{6rLD \log N}\,(1 - r \log p). \]

Further define the stopping time $T$ to be the first step $i$ such that

\[ \left| Q(i) - \frac{NDp^r}{r} \right| > e_q, \quad \text{or} \quad |d_v(i) - Dp^{r-1}| > e_d \ \text{ for some } v \in V(i). \]


Theorem 2.1. With high probability we have

\[ N - Tr = O\left( N \cdot (L/D)^{\frac{1}{2(r-1)}} \log^{\frac{5}{2(r-1)}} N \right). \]



3 Proof

For each variable $V$ and each bound (i.e. upper and lower) we introduce a critical
interval $I_V = [a_V, b_V]$. This interval varies with time and has one endpoint at
the bound we are trying to establish with the other slightly closer to the expected
trajectory. We only track $V$ if and when it enters a critical interval. If $V$ enters the
critical interval at step $j$ we 'start' observing a sequence of random variables that
is designed to be a sub- or supermartingale with the property that if $V$ eventually
passes all the way through the interval (and thereby violates the bound in question)
then this martingale has a large variation. When working with the lower bounds
we consider the sequence $V - a_V$. This sequence should be a submartingale with
initial value (i.e. value at step $j$) roughly $b_V - a_V$. Note that this sequence becomes
negative when the bound in question is violated. Similarly, when working with the
upper bound we consider the sequence $b_V - V$, which should be a supermartingale.
The event that $V$ ever violates one of the stated bounds is then the union over all
'starting' points $j$ of the event that one of the martingales that start at this point
has a large variation. We prove Theorem 2.1 by an application of the union bound,
taking the union over all variables $V$ of this union over all 'starting' points for both
the upper and lower bounds.

The reason that we focus our attention on these critical intervals is the fact that the
expected one-step changes in the variables we consider have self-correcting terms.
These terms introduce a drift back toward the expected trajectory when $V$ is far
from the expected trajectory. By restricting our attention to the critical intervals
we make full use of these terms. See [13] and [5] for early applications of this
self-correcting phenomenon in applications of the differential equations method for
proving dynamic concentration. As we noted above, the critical interval method we
use here was introduced in [3].

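We close this section with some notation conventions and a Lemma that we
use below. For an arbitrary random variable $V$ we define

\[ \Delta V(i) = V(i+1) - V(i). \]

We let $\mathcal{F}_i$ be the filtration of the probability space given by the first $i$ edges chosen
by the random greedy matching process.

Lemma 3.1. Suppose $(x_i)_{i \in I}$ and $(y_i)_{i \in I}$ are real numbers such that $|x_i - x| \le \delta$
and $|y_i - y| \le \epsilon$ for all $i \in I$. Then we have

\[ \left| \sum_{i \in I} x_i y_i - \frac{1}{|I|} \Big( \sum_{i \in I} x_i \Big) \Big( \sum_{i \in I} y_i \Big) \right| \le 2|I|\delta\epsilon. \]

Proof. The triangle inequality gives

\[ \left| \sum_{i \in I} (x_i - x)(y_i - y) \right| \le |I|\delta\epsilon. \]

Rearranging this inequality gives

\[ \sum_{i \in I} x_i y_i = x \sum_{i \in I} y_i + y \sum_{i \in I} x_i - |I|xy \pm |I|\delta\epsilon
 = \frac{1}{|I|} \Big( \sum_{i \in I} x_i \Big) \Big( \sum_{i \in I} y_i \Big) - \frac{1}{|I|} \Big( \sum_{i \in I} x_i - |I|x \Big) \Big( \sum_{i \in I} y_i - |I|y \Big) \pm |I|\delta\epsilon, \]

and the middle term is at most $|I|\delta\epsilon$ in absolute value.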
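
A quick numerical sanity check of this kind of estimate, written as a Python sketch. It assumes that the quantity bounded in Lemma 3.1 is $\left| \sum x_i y_i - \frac{1}{|I|} (\sum x_i)(\sum y_i) \right|$ and that the bound is $2|I|\delta\epsilon$; the helper name and the sample values are illustrative only.

```python
import random


def product_sum_gap(xs, ys):
    """Return |sum x_i*y_i - (sum x_i)*(sum y_i)/|I||."""
    n = len(xs)
    return abs(sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n)


rng = random.Random(0)
x, y, delta, eps, n = 5.0, -2.0, 0.3, 0.1, 1000
# Sequences that stay within delta of x and within eps of y, respectively.
xs = [x + rng.uniform(-delta, delta) for _ in range(n)]
ys = [y + rng.uniform(-eps, eps) for _ in range(n)]

gap = product_sum_gap(xs, ys)
bound = 2 * n * delta * eps
print(gap, "<=", bound)
```

In the proof below the lemma is used with both sequences equal to the vertex degrees, which turns a sum of squared degrees into the square of the degree sum plus a controlled error.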





3.1 Vertex degrees 

Let $v$ be a fixed vertex. As is usual in applications of the differential equations
method for establishing dynamic concentration, we begin with the expected one-step
change in $d_v$ (i.e. we begin with the trend hypothesis). We have

\[ E[\Delta d_v(i) \mid \mathcal{F}_i] = -\frac{1}{Q} \sum_{E \in \mathcal{H}(i):\, v \in E} \ \sum_{u \in E \setminus \{v\}} d_u(i) + O\left( \frac{L\, d_v(i)}{Q} \right), \]

where $\mathcal{F}_i$ is the filtration defined by the random greedy matching process.

We begin with the upper bound on $d_v$. Our critical interval is

\[ [\, Dp^{r-1} + e_d - f_d,\ Dp^{r-1} + e_d \,]. \]

The function $e_d$ is defined above and the function $f_d$ will be determined below. For
each step $j$ of the process we define the sequence of random variables

\[ d^+_{v,j}(i) := d_v(i) - Dp^{r-1} - e_d(t) \quad \text{for } i \ge j \]

with the stopping time $T_j$ defined to be the minimum of $\max\{T, j\}$ and the smallest
index $i \ge j$ such that $d_v(i)$ is not in the critical interval. Note that if $d_v(j)$ is not in
the critical interval then we simply have $T_j = j$. We prove dynamic concentration
by considering the sequence of random variables $d^+_{v,j}(j), \ldots, d^+_{v,j}(T_j)$. We chose $e_d$
(with foresight) so that this sequence is a supermartingale with respect to the
natural filtration $\mathcal{F}_i$. For $j \le i < T_j$ we have

\[
\begin{aligned}
E[\Delta d^+_{v,j}(i) \mid \mathcal{F}_i]
&= -\frac{1}{Q} \sum_{E \in \mathcal{H}(i):\, v \in E} \ \sum_{u \in E \setminus \{v\}} d_u(i)
   + \frac{Dr(r-1)}{N} p^{r-2} - \frac{1}{N} e_d'
   + O\left( \frac{L d_v}{Q} + \frac{D}{N^2} p^{r-3} + \frac{1}{N^2} e_d'' \right) \\
&\le -\frac{(Dp^{r-1} + e_d - f_d)(r-1)(Dp^{r-1} - e_d)}{NDp^r/r + e_q}
   + \frac{Dr(r-1)}{N} p^{r-2} - \frac{1}{N} e_d'
   + O\left( \frac{L d_v}{Q} + \frac{D}{N^2} p^{r-3} + \frac{1}{N^2} e_d'' \right) \\
&\le \frac{r(r-1)}{Np} f_d - \frac{1}{N} e_d'
   + O\left( \frac{(e_d - f_d) e_d}{NDp^r} + \frac{e_q}{N^2 p^2} + \frac{L d_v}{Q} + \frac{D}{N^2} p^{r-3} + \frac{1}{N^2} e_d'' \right).
\end{aligned}
\]

Note that we use the assumption that $d_v(i)$ lies in the critical interval. Also note
that in order to get the desired supermartingale condition it is necessary to choose
$f_d$ so that

\[ e_d' \ge \frac{r(r-1)}{p} f_d. \tag{3} \]

(Of course, this equation also guided the choice of $e_d$.) A function that satisfies
this equation is also sufficient to give the supermartingale condition as, assuming
the given error functions $e_d$, $e_q$, we have

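\[
\begin{aligned}
&\frac{(e_d - f_d) e_d}{NDp^r} + \frac{e_q}{N^2 p^2} + \frac{L d_v}{Q} + \frac{D}{N^2} p^{r-3} + \frac{1}{N^2} e_d'' \\
&\qquad = \frac{e_d}{Np} \cdot O\left( \frac{e_d}{Dp^{r-1}} + \frac{e_q}{Np\, e_d} + \frac{L}{e_d} + \frac{Dp^{r-2}}{N e_d} + \frac{1}{Np \log N} \right) \\
&\qquad = \frac{e_d}{Np} \cdot O\left( \frac{\sqrt{L} (\log N)^{3/2}}{\sqrt{D}}\, p^{1-r} + \frac{\sqrt{L}}{\sqrt{D}} + \frac{\sqrt{D}}{N\sqrt{L}} + \frac{1}{Np \log N} \right).
\end{aligned}
\tag{4}
\]
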

(We note that these estimates make repeated use of the simple inequality $D < NL$.)
By assuming that $p$ is a sufficiently large constant times

\[ (L/D)^{\frac{1}{2(r-1)}} \log^{\frac{5}{2(r-1)}} N \]

we see that the expression in (4) can be made smaller than any constant times
$e_d/(Np \log N)$. For the time being we assume that a function $f_d$ satisfying (3) can
be chosen. Thus, the supermartingale condition is satisfied.

We use a supermartingale inequality to bound the probability that the random
variable $d^+_{v,j}(T_j)$ is positive. The lemma we use is as follows:

Lemma 3.2. Let $X(i)$ be a supermartingale, such that $-\Theta \le \Delta X(i) \le \theta$ for all $i$,
where $\theta < \Theta/2$. Then for any $a < \theta m$ we have

\[ \Pr(X(m) - X(0) > a) \le \exp\left( -\frac{a^2}{3\theta\Theta m} \right). \]


Since $d_v$ is non-increasing, $Dp^{r-1}$ is decreasing and $e_d$ is increasing, the one step
change in $d^+_{v,j}$ is bounded above by the one step change in $Dp^{r-1}$, which is at most

\[ \frac{D(r-1)}{N} (1 + o(1)). \]

For a lower bound on $\Delta d^+_{v,j}$, note that the one step change in $e_d$ is negligible
compared to the maximum possible one step change in $d_v$, which occurs when we pick an
edge containing a vertex that has codegree $L$ with $v$. So we can set $\Theta = rL(1 + o(1))$.

Now, if $d_v$ crosses the upper boundary of its critical interval at the stopping
time $T$, then there is some step $j$ (with $T = T_j$) such that

\[ d^+_{v,j}(j) \le -f_d(t(j)) + \frac{D(r-1)}{N} (1 + o(1)) \]

and $d^+_{v,j}(T_j) > 0$. Applying the lemma (and assuming $D/N = o(f_d)$) we see that
the supermartingale has such a large upward deviation with probability at most



As there are $O(N^2)$ such supermartingales, we would like the above expression to
be $o(N^{-2})$. Thus, it suffices to take

\[ f_d = \sqrt{LD \log N}. \]

Furthermore this choice also satisfies (3). (Note that, in fact, this condition together
with (3) essentially determines the error function $e_d$.)

Thus, the probability that $T$ is less than the bound stated in Theorem 2.1 due to
a violation of the upper bound on $d_v$ goes to zero as $N$ tends to infinity.

The lower bound for $d_v$ is similar.

3.2 Number of edges

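We again begin with the trend hypothesis. We have

\[ E[\Delta Q(i) \mid \mathcal{F}_i] = -\frac{1}{Q} \sum_{A \in \mathcal{H}(i)} \sum_{v \in A} d_v(i) + O(L) = -\frac{1}{Q} \sum_{v \in V(i)} d_v(i)^2 + O(L). \]

For $i < T$ we have (using $|V(i)| = Np$ and $\sum_{v \in V(i)} d_v(i) = rQ$)

\[ \sum_{v \in V(i)} d_v(i)^2 = \frac{(rQ)^2}{Np} \pm 2Np\, e_d^2 \]

by an application of Lemma 3.1, and therefore

\[ E[\Delta Q(i) \mid \mathcal{F}_i] = -\frac{r^2 Q}{Np} \pm \frac{2Np\, e_d^2}{Q} + O(L). \]
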

We work with the upper bound on $Q(i)$. Our critical interval is

\[ [\, NDp^r/r + e_q - f_q,\ NDp^r/r + e_q \,]. \]

The function $e_q$ is defined above and the function $f_q$ will be determined below. For
each step $j$ of the process we define the sequence of random variables

\[ Q^+_j(i) := Q(i) - NDp^r/r - e_q \quad \text{for } i \ge j \]

with the stopping time $T_j$ defined to be the minimum of $\max\{T, j\}$ and the smallest
index $i \ge j$ such that $Q(i)$ is not in the critical interval. We begin by showing that
$Q^+_j(j), \ldots, Q^+_j(T_j)$ is a supermartingale. For $j \le i < T_j$ we have



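\[
\begin{aligned}
E[\Delta Q^+_j(i) \mid \mathcal{F}_i]
&\le -\frac{r^2 Q}{Np} + rDp^{r-1} - \frac{1}{N} e_q' + \frac{2Np\, e_d^2}{Q}
   + O\left( L + \frac{D}{N} p^{r-2} + \frac{1}{N^2} e_q'' \right) \\
&\le -\frac{r^2 (e_q - f_q)}{Np} - \frac{1}{N} e_q' + \frac{(2r + o(1))\, p^{1-r} e_d^2}{D}
   + O\left( L + \frac{D}{N} p^{r-2} + \frac{1}{N^2} e_q'' \right).
\end{aligned}
\]

In order to get the supermartingale condition this requires, up to constant factors,
$e_q \ge e_d^2 N p^{2-r} / D$. Note that this determines the main terms in the choice of $e_q$
above. We set

\[ f_q = NL \log N\, p^{2-r}. \]

Then we have

\[ -\frac{r^2 (e_q - f_q)}{Np} + \frac{(2r + o(1))\, p^{1-r} e_d^2}{D} \le -r^2 L\, p^{1-r} (\log N)(1 - r \log p)^2. \]

This clearly dominates the remaining error terms (note that $e_q' > 0$) and therefore
the sequence $Q^+_j(j), \ldots, Q^+_j(T_j)$ is a supermartingale.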

Now we apply the Hoeffding-Azuma inequality to bound the probability that
the random variable $Q^+_j(T_j)$ is positive. Since $i < T$ implies bounds on degrees, we
have

\[ |\Delta Q^+_j| = O(e_d) = O\left( \sqrt{LD \log N}\, (1 - \log p) \right). \]

Thus, if $Q$ crosses its upper boundary at the stopping time $T$, then there is some
step $j$ (with $T = T_j$) such that

\[ Q^+_j(j) \le -f_q(t(j)) + O\left( \sqrt{LD}\, \log^{3/2} N \right) \]

and $Q^+_j(T_j) > 0$. Applying the Hoeffding-Azuma inequality we see that the
supermartingale $Q^+_j$ has such a large upward deviation with probability at most

\[ \exp\left\{ -\Omega\left( \frac{(NL \log N\, p^{2-r})^2}{(Np)\,(LD \log N\, (1 - \log p)^2)} \right) \right\}
 = \exp\left\{ -\Omega\left( \frac{NL \log N\, p^{3-2r}}{D\, (1 - \log p)^2} \right) \right\}, \]

where $p = p(j)$. As there are at most $O(N)$ such supermartingales, the probability
that $T$ is less than the bound stated in Theorem 2.1 due to $Q(i)$ breaching the upper
bound tends to zero as $N$ tends to infinity.

The lower bound for $Q$ is similar.